Retrieving Lexical Semantics from Multilingual Corpora
Abstract
This paper presents a technique to build a lexical resource used for annotation of parallel corpora where the tags can be seen as multilingual ‘synsets’. The approach can be extended to add relationships between these synsets that are akin to WordNet relationships of synonymy and hypernymy. The paper also discusses how the success of this approach can be measured. The reported results are for English, German, French, and Greek using the Europarl parallel corpus.
Related resources
Spoken to Spoken vs. Spoken to Written: Corpus Approach to Exploring Interpreting and Subtitling
issue of Polibits includes a selection of papers related to the topic of processing of semantic information. Processing of semantic information involves usage of methods and technologies that help machines to understand the meaning of information. These methods automatically perform analysis, extraction, generation, interpretation, and annotation of information contained on the Web, corpus, nat...
Crosslingual and Multilingual Construction of Syntax-Based Vector Space Models
Syntax-based distributional models of lexical semantics provide a flexible and linguistically adequate representation of co-occurrence information. However, their construction requires large, accurately parsed corpora, which are unavailable for most languages. In this paper, we develop a number of methods to overcome this obstacle. We describe (a) a crosslingual approach that constructs a synta...
Standards & best practice for multilingual computational lexicons: ISLE MILE and more
ISLE (International Standards for Language Engineering) is a transatlantic standards oriented initiative under the Human Language Technology (HLT) programme within the EU-US International Research Co-operation. It is a continuation of the European EAGLES (Expert Advisory Group for Language Engineering Standards) initiative, carried out through a number of subsequent projects funded by the Europ...
Unsupervised Construction of a Multilingual WordNet from Parallel Corpora
This paper outlines an approach to the unsupervised construction from unannotated parallel corpora of a lexical semantic resource akin to WordNet. The paper also describes how this resource can be used to add lexical semantic tags to the text corpus at hand. Finally, we discuss the possibility to add some of the predicates typical for WordNet to its automatically constructed multilingual versio...
Predicting Lexical Relations between Biomedical Terms: towards a Multilingual Morphosemantics-based System
This paper addresses the issue of how semantic information can be automatically assigned to compound terms, i.e. both a definition and a set of semantic relations. This issue is particularly crucial when elaborating multilingual databases and when developing cross-language information retrieval systems. The paper shows how morpho-semantics can contribute in the constitution of multilingual lexi...
Journal: Polibits
Volume: 41
Publication date: 2010